Differential Evolution for Many-Particle Adaptive Quantum Metrology 
We devise powerful algorithms based on differential evolution for adaptive many-particle quantum 
metrology. Our new approach delivers adaptive quantum metrology policies for feedback control 
that are orders-of-magnitude more efficient and surpass the few-dozen-particle limitation arising in 
methods based on particle-swarm optimization. We apply our method to the binary-decision-tree 
model for quantum-enhanced phase estimation as well as to a new problem: a decision tree for 
adaptive estimation of the unknown bias of a quantum coin in a quantum walk and show how this 
latter case can be realized experimentally. 
Quantum-enhanced metrology (QEM) aims to achieve single-shot parameter estimation for Hamiltonian-generated evolution of $N$ particles with a degree of imprecision $\Delta_N$ (e.g., standard deviation) that beats the semiclassical limit (or "standard quantum limit"). This limit is due to particle partition noise (vacuum fluctuations) [1] and ultimately restricts the precision of clocks [2], gravitational wave detection [3] and adaptive Hamiltonian identification [4]. Mathematically, $\Delta_N \in O(N^{-p})$ with $N$ the number of particles in the probe "pulse" (our term for a collection of particles, e.g., photons) and $p = 1/2$ ($p = 1$) in the semiclassical (ultimate) precision limit [5-7]. The objective of single-shot QEM is to attain precision exceeding $p = 1/2$ and reaching as close as possible to $p = 1$ for a single "pulse", as opposed to tomography where many "pulses" could be used. Two common QEM strategies inject quantum-resource-laden (e.g., entangled) input states followed by either (i) multi-particle joint measurement or (ii) our focus: adaptive QEM (AQEM), which employs only local measurements, each followed by optimal control of system parameters in order to extract maximal information about unknown parameters [8].

Finding effective adaptive-feedback procedures (known as "policies" in machine learning) is typically intractable but facilitated by decision-tree learning [9, 10]. Here we report three major new advances in AQEM enabled via our introduction of differential-evolution (DE) [11] decision-tree learning to AQEM: (a) surpassing the few-dozen-particle limit in previous interferometric-phase-estimation studies [9, 10], explained and depicted in Fig. 1; (b) advancing beyond the binary decision tree for quantum-walk coin-bias parameter estimation; and (c) showing how our learning algorithm can be used in optical quantum experiments with current technology [12].
We introduce DE as a tool for AQEM because of its known superiority over the Particle Swarm Optimization (PSO) machine-learning algorithm [13] for many optimization problems, especially for high-dimensional search spaces [14, 15], hence appropriate for AQEM. (We immediately see an ambiguity of terminology: the particle traversing the interferometer is different from the machine-learning particle, which is a test function in a search space. As particle is a common term in both quantum physics and machine learning, we will use the term in both ways and make the term clear through context.) Whereas previous work [9, 10] demonstrated that swarm (collection of particles) intelligence yields AQEM algorithms superior to algorithms so far devised by sentient beings (i.e., humans), our use of an evolutionary algorithm here goes beyond an in-principle demonstration of artificial intelligence for AQEM towards a realistic approach to devising algorithms for many-particle systems.

FIG. 1: (Color online) A set of (possibly entangled) particles, solid (red) circles on the LHS, are injected into a system with unknown parameter $\phi$. Information from sequential measurements on each outgoing particle, faded (red) circle on the RHS, is fed to a processing unit (PU) to modify a control parameter $\Phi$ to enhance the precision of estimating $\phi$. Machine learning is used on training sets to find a suitable decision-tree-based algorithm for the PU so that single-shot estimates of $\phi$ beat the semiclassical measurement limit.

PSO is inspired by a social-behavior model comprising $S$ individual "particles" stochastically searching a vector space punctuated by $T$ iterations of mutual communication and collective-intelligence decisions to circumvent local-minima traps. The PSO AQEM algorithm employs a highly effective logarithmic-search heuristic to devise policies for single-shot AQEM interferometric phase estimation [9, 10]. Here a policy is defined to be a procedure that an "agent", representing the feedback loop, adopts given a set of measurement results for a subset of particles in the output pulse. A good policy, namely, a policy that beats the semiclassical measurement limit, was previously obtained with a computational-space overhead $O(N)$ accompanied by a run-time cost $O(N^6)$ [10]. Here we show that this aforementioned PSO-based algorithm breaks down for just dozens of particles, but we remedy this limitation here by switching the learning algorithm from PSO to DE (which we show dramatically speeds up the simulation run-time) but pay a slight time-cost penalty, namely $O(N^7)$ instead of the previous $O(N^6)$, thereby surpassing the previous maximum-number-of-particles barrier to devising policies.

Our employment of a DE AQEM algorithm also enables us to go beyond the restrictive binary-outcome measurement model for two-output-port interferometry. We introduce an example of a single-shot AQEM problem with a higher number of possible measurement outcomes, hence a larger $d$-ary tree. Specifically we now solve the harder case of a discrete-time quantum walk with $N$ walkers (effectively a "pulse" of walkers) and a quantum-coin operator that has an unknown bias. The AQEM objective is to ascertain the quantum coin's bias with an imprecision that scales better than the semiclassical limit $p = 1/2$. As a position measurement of the walker at time $t$ yields an outcome in $\{-t, \ldots, t\}$, the resultant decision tree is $d$-ary for $d = 2t + 1$. Our strategy is to replace the $d \propto t$ tree by a quaternary ($d = 4$) decision tree and show the effectiveness of DE for finding a policy that beats the semiclassical limit. Furthermore we propose a feasible quantum optical quantum-walk experiment that can attain the semiclassical limit and potentially beat it by exploiting entangled photons.

Let us now establish a mathematically rigorous AQEM model. In the lossless, decoherence-free case, an $N$-particle input "pulse" state $|\psi\rangle$, with single-particle Hilbert space $\mathscr{H}_1$, is acted on sequentially particle-by-particle by a device with unknown parameter $\phi$, which could be a multicomponent vector $\boldsymbol{\phi}$, according to $\mathcal{D}(\phi; \Phi_i) : \mathscr{H}_1 \to \mathscr{H}_1$ with $\Phi_i$ a control parameter (possibly a multi-component vector as well) that is modified according to the measurement history on previous particles. Each $\mathcal{D}$-transformed particle is measured according to $A_i : \mathscr{H}_1 \to O_i$ for $O_i$ a set of measurement outcomes. For the interferometer $O_i = \{0, 1\}$; for the quantum walk $O_i = \{-t, \ldots, t\}$. Although $\mathcal{D}$ generically has a $2^N \times 2^N$ representation, this reduces to $N \times N$ for a permutationally-symmetric input state $|\psi\rangle$ [16]. The sequence $\Phi = \{\Phi_i\}$ is the policy for controlling the interferometer in order to extract a measurement of $\phi$ with low imprecision $\Delta_N$. Our aim is to devise an efficient algorithm that delivers a fit policy $\Phi$ such that $\Delta_N$ scales better than $p = 1/2$, and each policy is a test function, or particle, in the machine-learning procedure.

Our policy-devising algorithm, which uses machine learning, has the following inputs: number of particles $N$, permutationally-symmetric input state $|\psi\rangle \in \mathbb{PC}^{N+1}$, a prior probability distribution $\mathcal{P}$ for the unknown system parameter $\phi$ (typically uniform), the device operator $\mathcal{D}(\phi) \in \mathbb{C}^{N+1} \times \mathbb{C}^{N+1}$, the set of projectors $\Pi_i$ for each $i$th particle ($|j\rangle\langle j|$ for $j \in \{0, 1\}$ in the interferometer case and $j \in \{-t, \ldots, t\}$ for the quantum-walk case), an integer $I$ to determine which machine-learning algorithm to use such as PSO or DE, the number $S$ of particles, or "chromosomes" in DE parlance, the number $T$ of iterations, the fitness functional $\mathcal{F}$ that assesses the precision guaranteed by executing the policy, and the maximum number $\Omega$ of repetitions the machine-learning algorithm is permitted to run before aborting. From the multitude of available machine learning techniques, we compare the two powerful cases of PSO and DE to devise policies that deliver AQEM parameter estimation.

The PSO algorithm is based on having multiple particles undergoing independent stochastic searches interrupted by periodic iterations of communication between overlapping logarithmic-sized neighborhoods of particles that tend to steer these particles, depicted in Fig. 2(a), towards superior policy regions of the vector space. Similarly, DE also employs multiple policies undergoing independent stochastic searches. Instead of interruptions by rounds of communication and steering, DE is interrupted by a cross-over breeding between the original chromosome and a hybrid of three randomly chosen chromosomes from the remaining set of policies. The fitter of the original and the cross-over of the original with the hybrid is retained for the next round; see Fig. 2(b).
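
The DE round described above can be illustrated with a minimal Python sketch. It assumes the widely used DE/rand/1/bin scheme, in which the hybrid (donor) is built as a + F (b - c) from three randomly chosen chromosomes; the details of the authors' implementation may differ. The function name `de_generation` and the parameter names `mutation_scale` and `crossover_rate`, along with their default values, are illustrative assumptions, and `fitness` stands for any function to be maximized.

```python
import random


def de_generation(population, fitness, mutation_scale=0.8, crossover_rate=0.9, rng=None):
    """Run one DE generation (DE/rand/1/bin variant, assumed here) and return the new population.

    population: list of chromosomes (lists of floats); at least 4 are required.
    fitness:    callable returning a score to be maximized.
    """
    rng = rng or random.Random()
    dim = len(population[0])
    next_population = []
    for i, original in enumerate(population):
        # Pick three distinct chromosomes from the remaining set of policies.
        others = [p for k, p in enumerate(population) if k != i]
        a, b, c = rng.sample(others, 3)
        # Hybrid (donor) chromosome built from the three parents.
        donor = [a[j] + mutation_scale * (b[j] - c[j]) for j in range(dim)]
        # Cross the donor with the original; one gene is always taken from the donor.
        forced = rng.randrange(dim)
        trial = [
            donor[j] if (j == forced or rng.random() < crossover_rate) else original[j]
            for j in range(dim)
        ]
        # Retain whichever of the original and the trial is fitter.
        next_population.append(trial if fitness(trial) >= fitness(original) else list(original))
    return next_population
```

Because selection is made chromosome by chromosome, no member of the population can become less fit from one round to the next, which is the property the comparison step in Fig. 2(b) provides.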

Algorithm-specific inputs for PSO are exploration weight $\alpha$, exploitation weight $\beta$, velocity clamping $\nu$ and inertial weight $\omega$. For DE, the algorithmic inputs are mutation scaling $\mu$ and cross-over rate $\gamma$. In order to perform a fair comparison between PSO- and DE-based adaptive policy-devising algorithms, we ensure that all common input parameters are identical and parameters specific to PSO or DE are optimized. Now we consider how to make the policy-devising algorithm efficient and also determine the space and time complexities. We reduce the space complexity by employing a logarithmic-search heuristic that parametrizes the decision tree only by its depth, and the depth equals $N$, implying a space cost $O(N)$ [9].

We develop heuristics to ensure a polynomial time cost: (i) simulating the interferometer for a single $N$-particle pulse is $O(N^2)$ [16, §4.2]; (ii) iterating the search steps $T \in O(N)$, which is higher than previous studies that set $T \in O(1)$ but enables breaking the few-dozen-particle limit in that work [9, 10] for the DE case but not for the PSO case as shown in Fig. 3(a); (iii) assessing a candidate policy from $K \in O(N^2)$ samplings [10]; (iv) repeating for each of the $S \in O(N)$ particles; and (v) constructing a starting distribution from the $(N-1)$-particle policy with concomitant time cost $\in O(N)$. The $N$-particle policy imprecision $\Delta_N$, determined from the preceding fittest $(N-1)$-particle policy, has a ratio $\Delta_N/\Delta_{N-1} = 1 - 1/N$, which necessitates $\Omega \in O(N)$ repetitions of the algorithm.

FIG. 2: (Color online) (a) Pictorial representation of the PSO algorithm. Each particle, represented here by a bird, stores its current position (solid (red) circle) in the search space (actually parameters of the decision tree), its best previous position (top left (green arrow)) and the best neighbour position (bottom right (blue arrow)). Each bird undergoes simultaneous velocity vector updates (to the dashed (red) circle) according to three terms: an inertial term limiting the change in velocity plus two terms that rescale and redirect the velocity to its own personal best and to the best bird in the neighbourhood, respectively. (b) Pictorial representation of the DE algorithm. Each chromosome is a vertical block (of decision-tree parameters) and initialized to random values in the search space (top left). For each chromosome, three random chromosomes are chosen to be parents of a donor chromosome comprising random data from each parent (top right). This donor chromosome is crossed randomly with the original chromosome to create a trial chromosome (bottom left), which is compared with the original (bottom right), and the fitter chromosome is retained for the next iteration. The dashed line represents a single iteration of the differential algorithm.

FIG. 3: (Color online) Imprecision $\Delta_N$ of (a) the interferometric phase and (b) the quantum-walk coin bias for the semiclassical (uppermost dotted line) and ultimate quantum-limited (lowest dashed line) cases, DE (straight green middle line) and PSO (blue middle line that tilts upward for large $N$). The PSO- and DE-based plots each required (a) 315 and (b) 403 CPU-hours on a cluster of 100 parallel cores each running at 2.66 GHz and show $p = 0.74$.
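
As a rough consistency check on the quoted run-time scaling, one can multiply the individual costs. This bookkeeping is an assumption on our part (the text does not spell out how the factors combine): items (i) to (iv) and the $\Omega$ repetitions are taken to multiply, while the $O(N)$ cost of item (v) is treated as a lower-order additive term.

```latex
\underbrace{O(N^2)}_{\text{(i) simulation}}
\times \underbrace{O(N)}_{\text{(ii) } T}
\times \underbrace{O(N^2)}_{\text{(iii) } K}
\times \underbrace{O(N)}_{\text{(iv) } S}
\times \underbrace{O(N)}_{\Omega}
= O(N^7)
```

Under the same assumption, setting $T \in O(1)$ as in the earlier studies removes one factor of $N$ and gives $O(N^6)$.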

The adaptive interferometric phase-estimation algorithm commenced with the initial (unnormalized) multi-particle two-mode entangled state of Ref. [9], which is expressed in terms of a reduced rotation matrix element $d$ [17], and with the fitness function for the phase-error distribution $P(\zeta)$ given by $\left|\int P(\zeta)\, e^{i\zeta}\, d\zeta\right|$, with $\zeta$ the absolute difference between inferred and correct phase in a training set. We see in Fig. 3(a) that the adaptive interferometric-phase-estimation policies found using DE surpass those found using PSO in that they maintain the power-law scaling (better than the semiclassical limit) past the few-dozen particle-number limit. We are able to simulate up to 98 particles in the input state with no sign of breakdown. As we use the same space and time resources for the PSO- and DE-based algorithms, an improvement from simulating up to 45 particles in the former algorithm to 98 particles in the latter corresponds to a $(98/45)^7 \approx 232$-fold effective run-time speed-up.
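
The arithmetic behind this figure can be checked with a few lines of Python. The helper name `effective_speedup` is ours, and the sketch assumes only that the run-time scales as $N^7$, so that reaching a larger particle number with the same resources is equivalent to a speed-up by the ratio of particle numbers raised to the seventh power.

```python
def effective_speedup(n_old, n_new, exponent=7):
    """Equivalent speed-up when the reachable particle number grows from
    n_old to n_new and run-time scales as N**exponent."""
    return (n_new / n_old) ** exponent


# Interferometric-phase case: 45 particles (PSO) versus 98 particles (DE).
print(round(effective_speedup(45, 98)))  # 232
```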

As DE is so much more powerful than PSO for adaptive quantum metrology, we consider solving a significantly more challenging AQEM problem, specifically estimating the bias $\phi$ of a quantum-walk coin [18]. The key challenge is due to the larger number of measurement outcomes than just two for interferometric phase estimation. For each walker, the walker-coin basis states at time $t$ are $\{|x, c\rangle : x \in \{-t, \ldots, t\}, c \in \{-1, 1\}\}$ with dimension $d_t = 2(2t + 1)$. Each quantum-walk step is a sequence of a coin flip $C(\phi)|c\rangle = \sqrt{\phi}\,|-1\rangle + c\sqrt{1 - \phi}\,|1\rangle$ and a conditional walker translation $S|x, c\rangle = |x + c, c\rangle$. The step operation $S(\mathbb{1} \otimes C)$ is repeated $t$ times.
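
To make the step operation concrete, the following Python sketch evolves a single walker for a given number of steps and returns its position distribution. It is an illustration under stated assumptions, not the authors' simulation code: the coin is taken to be the real unitary matrix with rows $(\sqrt{\phi}, \sqrt{1-\phi})$ and $(\sqrt{1-\phi}, -\sqrt{\phi})$ in the basis $(c=-1, c=+1)$, which reduces to the Hadamard coin at $\phi = 1/2$, and the default initial coin state $(|-1\rangle + i|1\rangle)/\sqrt{2}$ is our choice. The function name `biased_walk_distribution` is hypothetical.

```python
import math


def biased_walk_distribution(phi, steps, coin_state=None):
    """Return {position: probability} after `steps` steps of a coined walk on the line.

    Assumed coin (illustrative, see lead-in): [[sqrt(phi), sqrt(1 - phi)],
    [sqrt(1 - phi), -sqrt(phi)]] in the coin basis (c = -1, c = +1).
    """
    if coin_state is None:
        coin_state = (1 / math.sqrt(2), 1j / math.sqrt(2))
    a, b = math.sqrt(phi), math.sqrt(1 - phi)
    # Amplitudes are indexed by (position x, coin value c); the walker starts at x = 0.
    amplitudes = {(0, -1): coin_state[0], (0, 1): coin_state[1]}
    for _ in range(steps):
        updated = {}
        for (x, c), amp in amplitudes.items():
            # Coin flip: apply the column of the coin matrix selected by c.
            if c == -1:
                flipped = {-1: a * amp, 1: b * amp}
            else:
                flipped = {-1: b * amp, 1: -a * amp}
            # Conditional translation S|x, c> = |x + c, c>.
            for new_c, new_amp in flipped.items():
                key = (x + new_c, new_c)
                updated[key] = updated.get(key, 0) + new_amp
        amplitudes = updated
    # Trace over the coin to obtain the position distribution on {-steps, ..., steps}.
    probabilities = {x: 0.0 for x in range(-steps, steps + 1)}
    for (x, _), amp in amplitudes.items():
        probabilities[x] += abs(amp) ** 2
    return probabilities
```

For example, `biased_walk_distribution(0.5, 3)` gives probabilities 1/8, 3/8, 3/8, 1/8 at positions -3, -1, 1, 3, the symmetric distribution expected for an unbiased coin with this initial state.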

The procedure to estimate bias $\phi$ is similar to estimating the interferometric phase in that a single pulse of sequential particles is injected into a quantum-walk apparatus of duration $t$, where the particles in this case are quantum walkers. Unlike the interferometric case where each particle is equally likely to traverse each of two available paths, here the bias causes an unequal split between multiple paths. In contrast to the classical case, where the bias shifts the walker's distribution left or right, the quantum-biased coin alters the shape of the distribution.

We assume an initial $N$-walker input, adapted from the two-walker state [19], such that the state is permutationally symmetric, in order to ensure algorithmic time cost $O(N^2)$ as in the interferometric-phase case. Furthermore each walker's initial state is symmetrized with respect to the position around $x = 0$. The position distribution for the walker's reduced state (tracing over the coin state) becomes increasingly asymmetric due to the bias of the coin, and we introduce the skewness of this distribution (given that the quantum-coin bias alters the distribution shape) as the fitness parameter in the machine-learning algorithm. This machine-learning algorithm is part of an AQEM algorithm responsible for finding a fit feedback policy that determines how much to modify the coin's bias subsequent to each single-particle measurement.
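
As an illustration of the quantity involved, the sketch below computes the skewness of a discrete position distribution. The text does not give the exact definition used, so the standardized third central moment is assumed here; the function name `skewness` and the dictionary input format are likewise our own choices.

```python
def skewness(probabilities):
    """Standardized third central moment of a discrete distribution.

    probabilities: dict mapping position -> probability (need not be normalized).
    Assumes the distribution has non-zero variance.
    """
    total = sum(probabilities.values())
    mean = sum(x * p for x, p in probabilities.items()) / total
    variance = sum((x - mean) ** 2 * p for x, p in probabilities.items()) / total
    third = sum((x - mean) ** 3 * p for x, p in probabilities.items()) / total
    return third / variance ** 1.5
```

A distribution that is symmetric about its mean has zero skewness, so any non-zero value signals the kind of shape asymmetry that a biased coin is said to produce.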

For estimating the coin bias, we introduce an effective heuristic based on reducing the $d$-ary decision tree to a quaternary ($d = 4$) decision tree and maintaining the logarithmic search heuristic developed for the interferometric-phase case [9]. The reason for $d = 4$ begins with recognizing that the quantum walker's position distribution can be broken up into four regions given by left-outer, left-inner, right-inner and right-outer. As is well known for the coined quantum walk, the inner region of the position distribution contains 1/3 of the position probability and is approximately uniform. The outer region contains the remaining 2/3 of the distribution and is highly peaked [20]. The skewness of the distribution is expected to show more strongly by comparing the left and right outer regions rather than restricting to the binary case of comparing the entire left and right regions.
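
The grouping of positions into four regions can be sketched as a simple binning function. This is only an illustration: the function name `quaternary_outcome`, the boundary parameter `inner_edge`, and the convention of counting $x = 0$ as right-inner are all assumptions, since the text does not state where the inner region ends.

```python
def quaternary_outcome(x, inner_edge):
    """Map a walker position x to one of four decision-tree outcomes.

    Returns 0 (left-outer), 1 (left-inner), 2 (right-inner) or 3 (right-outer).
    inner_edge is a hypothetical boundary: positions with |x| <= inner_edge are "inner".
    """
    if x < -inner_edge:
        return 0
    if x < 0:
        return 1
    if x <= inner_edge:
        return 2
    return 3
```

With such a map, each measured position contributes one of four branches to the decision tree regardless of the walk duration, which is what keeps the tree quaternary even when the number of possible positions is large.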

We execute the policy-devising algorithm with this $d = 4$ heuristic with $O(N)$ space cost and $O(N^7)$ time cost as before. Figure 3(b) shows imprecision $\Delta_N$ of policies found using PSO and DE with the semiclassical and ultimate quantum power-law limits for reference, where $\Delta_N$ is the imprecision not of $\phi$ but rather of $t\phi$ because the biased coin operation has been executed $t$ times. Specifically Fig. 3(b) shows that, in the PSO case, power-law scaling holds for up to 35 walkers per pulse and fails beyond 35 walkers. Contrariwise the DE-based adaptive metrology algorithm successfully determines policies that maintain power-law scaling up to 75 walkers with no sign of power-law breakdown. The resultant improvement from 35 to 75 walkers by using DE instead of PSO corresponds to a $(75/35)^7 \approx 208$-fold decrease of effective run-time, which is comparable to the speed-up for the interferometric-phase case.

AQEM for quantum walks is particularly exciting because implementation is feasible with existing quantum optical quantum-walk experimental techniques [12] as we now show. In this approach quantum walkers are photons, and the position degree of freedom is replaced by time of arrival. The coin state corresponds to the polarization state of the photon, and coin flips are executed using a half-wave plate (HWP), which transforms the polarization to a superposition of the two polarizations that can be unequally weighted according to the angle $\theta$ of the HWP relative to one of the polarization axes. Quantum-walk steps are implemented by having the photons circumambulate an optical fiber loop as depicted in Fig. 4. A 50:50 beamsplitter enables the photon to exit the loop leading to an avalanche photo diode (APD) where the position of the walker is realized temporally as an arrival time. Thus, there is only a 50% chance the photon will remain in the fiber loop and advance to the next step.

In the model we propose, each walker performs $t$ steps before being measured. Our modification to existing experiments is shown in Fig. 4. This modification replaces the 50:50 beamsplitter with an active switch into the detection fiber (as suggested earlier [12]). This switch allows for a controllable number $t$ of time steps. The 'biased' coin is achieved using the HWP with an unknown angle $\theta$. The adaptive coin operation is implemented by another HWP with the angle $\theta$ controlled by a processing unit programmed with the specific feedback policy found by our algorithm. Our heuristic of grouping the measurement outcomes is accomplished by translating those groupings into arrival time bins. Thus our scheme could be implemented and used to obtain the semiclassical limit and possibly better if we can exploit entangled photons.

FIG. 4: (Color online) Schematic of proposed adaptive quantum metrology experiment to determine the bias of the half-wave plate (HWP). A laser source (red star) is attenuated to the single-photon level. The field then passes through a polarizing beam splitter (PBS), then a HWP, followed by a quarter-wave plate (QWP) to prepare ('Prep') any initial walker state. The beam splitter is controlled by an active switch, which determines whether the photon re-enters the loop or is sent to the avalanche photo diode (APD) detector (after $t$ steps). Once in the fiber network, the unknown bias coin operation is performed using a half-wave plate with unknown angle $\theta$. The adaptive coin operation is effected by another HWP with $\theta$ modified by the processing unit (PU) subsequent to measurement of the previous walker's time of arrival. The photons then pass through a delay loop with PBSs on either side effecting the shift-in-time operation.

In summary we establish that DE is a powerful machine-learning tool for devising adaptive quantum-enhanced metrology policies and that our DE-based policy-devising algorithm significantly surpasses PSO for two important cases: adaptive interferometric-phase estimation and estimating the bias of a quantum walker's coin. This latter case entails using a $d$-ary decision tree where $d$ can be much greater than two, and we show that a $d = 4$ heuristic is effective even for large $d$. The power of the DE-based algorithm is evident in the fact that we double the number of particles solvable in a given computer time. Given the $O(N^7)$ run-time cost of the algorithms, this means that we have an effective run-time speed-up of approximately $2^7$ over the previous best, namely the PSO-based algorithm. Moreover, our new DE-based algorithm shows no sign of power-law deviation for double the number of particles compared to the PSO-based algorithm, which means not only is there a run-time speed-up but also that the policies show improvement right up to the data point for the largest particle number. Finally we show that our adaptive quantum metrology policy-devising algorithm can be effected with current optical quantum-walk technology. Policies for quantum metrology in the presence of phase noise and decoherence of the multi-photon state are known using PSO [10], but DE algorithms for these conditions are a topic for future work.